Exploiting Minimal Resources for Subcategorization Frame Acquisition

Authors

  • Katia Kermanidis
  • Manolis Maragoudakis
  • Nikos Fakotakis
Abstract

The detection of the set of syntactic frames, i.e. the syntactic entities a certain verb subcategorizes for, is important especially for tasks like parsing and grammar development. Machine-readable dictionaries listing subcategorization frames usually give only expected frames rather than actual ones and are therefore incomplete, or are not available for some languages, including Modern Greek (MG). By acquiring frames automatically from corpora, these problems are overcome altogether. Previous work on learning frames automatically from corpora focuses mainly on English. [...] Heid (1996) work on German, while Basili et al. (1997) deal with Italian and Zeman and Sarkar (2000) focus on Czech. In most of the above approaches, the input corpus is fully parsed and, if not, only a limited number of frames are learned. As is the case for the majority of languages, a treebank or a wide-coverage syntactic parser is not yet available for MG. Constructing a treebank is expensive and time-demanding. The automatic acquisition of subcategorization information by exploitation of as limited linguistic resources as possible appears to be very challenging. Contrary to English, which has a more or less fixed-order syntactic structure, in MG the position of the constituents of a sentence is a very weak indicator of their syntactic role. Morphology, on the other hand, is essential for determining verb-argument structure. Based on the above properties of the language, the environments of the verbs in the corpus are formed and...

Similar resources

Learning Automatic Acquisition of Subcategorization Frames Using Bayesian Inference and Support Vector Machines

Learning Bayesian Belief Networks (BBN) from corpora and Support Vector Machines (SVM) have been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We are incorporating minimal linguistic resources, i.e. basic morphological tagging and phrase chunking, to demonstrate that verb subcategorization, which is of great significance for developing robust natural la...

Merging Lexicons for Higher Precision Subcategorization Frame Acquisition

We present a new method for increasing the precision of an automatically acquired subcategorization lexicon, by merging two resources produced using different parsers. Although both lexicons on their own have about the same accuracy, using only sentences on which the two parsers agree results in a lexicon with higher precision, without too great loss of recall. This “intersective” resource merg...

Bengali Verb Subcategorization Frame Acquisition - A Baseline Model

Acquisition of verb subcategorization frames is important as verbs generally take different types of relevant arguments associated with each phrase in a sentence in comparison to other parts of speech categories. This paper presents the acquisition of different subcategorization frames for a Bengali verb Kara (do). It generates compound verbs in Bengali when combined with various noun phrases. ...

The Automatic Acquisition Of Frequencies Of Verb Subcategorization Frames From Tagged Corpora

We describe a mechanism for automatically acquiring verb subcategorization frames and their frequencies in a large corpus. A tagged corpus is first partially parsed to identify noun phrases and then a linear grammar is used to estimate the appropriate subcategorization frame for each verb token in the corpus. In an experiment involving the identification of six fixed subcategorization frames, o...

Approaches to verb subcategorization for biomedicine

Information about verb subcategorization frames (SCFs) is important to many tasks in natural language processing (NLP) and, in turn, text mining. Biomedicine has a need for high-quality SCF lexicons to support the extraction of information from the biomedical literature, which helps biologists to take advantage of the latest biomedical knowledge despite the overwhelming growth of that literatur...

A Subcategorization Frames Acquisition System for French Verbs

This paper presents a system intended to automatically acquire subcategorization frames (SCFs) of verbs from the analysis of large corpora. The system has been applied to a newspaper corpus (made of 10 years of the French newspaper Le Monde) and acquired subcategorization information for 3267 verbs. 286 SCFs were dynamically learnt for these verbs. From the analysis of 25 representative verbs, ...

Publication date: 2004